A document image analysis system on parallel processors

Authors

  • Shamik Sural
  • P. K. Das
Abstract

This paper presents a document image processing system implemented on a set of parallel processors. A preprocessing stage first corrects skew in the scanned document images. The corrected image is segmented and labelled in a two-step Minimum Containing Rectangle (MCR) detection stage. Text Block Filtering (TBF) is then performed heuristically, and the filtered blocks are submitted to a Multi-Layer Perceptron (MLP) for character recognition. Smoothing of the document image is done during MLP-based character recognition to reduce the preprocessing time. This also reduces the formation of merged characters, a main source of recognition errors in conventional approaches. The MLP also identifies bold words during recognition, which are used for automatic indexing of documents. Data is partitioned by exploiting the inherent parallelism in document image data. Communication overhead is small compared to the computation time, so a high degree of parallelization is achieved, reducing the total execution time.
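As a rough illustration of the segmentation and data-partitioning ideas in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a binary image stored as a list of rows (1 = ink), finds one containing rectangle per 8-connected component, and partitions the page into bands separated by blank rows so that each band can be handled independently. The function names (`containing_rectangles`, `text_bands`, `parallel_rectangles`) are hypothetical, the blank-row partition is an assumed scheme, and a thread pool merely stands in for the separate processors used in the paper.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def containing_rectangles(image):
    """Return (top, left, bottom, right) boxes of 8-connected ink components."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill, tracking the extent of the component.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def text_bands(image):
    """Return (start, end) row ranges (end exclusive) of runs of non-blank rows."""
    bands, start = [], None
    for r, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = r
        elif not has_ink and start is not None:
            bands.append((start, r))
            start = None
    if start is not None:
        bands.append((start, len(image)))
    return bands


def parallel_rectangles(image, workers=2):
    """Assumed partition scheme: one task per band; no component spans two bands."""
    def work(band):
        start, end = band
        local = containing_rectangles(image[start:end])
        # Shift row coordinates back into the full-page frame.
        return [(t + start, l, b + start, r) for (t, l, b, r) in local]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(work, text_bands(image))  # map preserves band order
    return [box for part in results for box in part]


page = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
boxes = parallel_rectangles(page)
```

Because bands are separated by fully blank rows, no connected component crosses a band boundary, so the per-band results can be concatenated without any merging step. That is one way the low communication overhead mentioned in the abstract could arise, though the paper's actual partitioning may differ.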


Similar resources

Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method

In this article, a parallel computer program based on the Finite Element Method is implemented to speed up the analysis of hollow circular cylinders made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials whose composition varies gradually over their volume. In parallel processing, an algorithm is first divided into independent tasks, which may use individual or shared da...


Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...


Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval, in which the search over document images is based on a query word image. This paper proposes an approach to document image retrieval based on keyword spotting, using a framework built on relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...


Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art models for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...


The Versatile Image Processor V. I. P. (Hardware Design)

This paper presents the architecture of a medium-grain parallel processor well suited for image analysis. The processor, named V.I.P., is composed of clusters of 4 Intel i860 RISC processors connected among themselves and to I/O units through an industry-standard VME parallel bus, a custom parallel Video Bus, and a serial network. The processors operate concurrently on cluster and system ...




Publication date: 1997